GMM-Based Missing-Feature Reconstruction on Multi-Frame Windows
Abstract
Methods for missing-feature reconstruction substitute noise-corrupted features with clean-speech estimates calculated from reliable information found in the noisy speech signal. Gaussian mixture model (GMM) based reconstruction has conventionally focussed on reliable information present in a single frame. In this work, GMM-based reconstruction is applied to windows that span several time frames. Mixtures of factor analysers (MFA) are used to limit the number of model parameters needed to describe the feature distribution as window width increases. Using the window-based MFA in a noisy speech recognition task resulted in relative error reductions of up to 52 % compared to frame-based GMM.
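The reconstruction idea can be illustrated with a minimal sketch. It assumes a plain conditional-mean estimate under a full-covariance GMM; the abstract does not specify the estimator, and the paper's MFA covariance structure is not modelled here. The function name `gmm_conditional_mean` and all variable names are hypothetical. A multi-frame window is treated as one vector formed by concatenating its frames, so the same routine covers the frame-based and window-based cases.

```python
import numpy as np

def gmm_conditional_mean(x, reliable, weights, means, covs):
    """Fill unreliable entries of x with their GMM conditional mean.

    x        : (d,) feature vector (a frame, or a flattened multi-frame window)
    reliable : (d,) boolean mask, True where x is trusted
    weights  : (K,) mixture weights
    means    : (K, d) component means
    covs     : (K, d, d) full component covariances (assumption: not MFA)
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(reliable, dtype=bool)
    m = ~r
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty((n_comp, int(m.sum())))
    for k in range(n_comp):
        mu, cov = np.asarray(means[k]), np.asarray(covs[k])
        cov_rr = cov[np.ix_(r, r)]   # reliable-reliable block
        cov_mr = cov[np.ix_(m, r)]   # missing-reliable block
        diff = x[r] - mu[r]
        sol = np.linalg.solve(cov_rr, diff)
        _, logdet = np.linalg.slogdet(cov_rr)
        # Log of weight times the Gaussian density of the reliable part
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + r.sum() * np.log(2.0 * np.pi)
        )
        # Per-component conditional mean of the missing part
        cond_means[k] = mu[m] + cov_mr @ sol
    # Component posteriors given only the reliable components
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    x_hat = x.copy()
    x_hat[m] = post @ cond_means
    return x_hat

# Toy window: two frames with two features each, flattened to length 4.
window = np.array([[1.0, 0.5], [0.0, 0.0]]).reshape(-1)
mask = np.array([True, True, False, False])   # second frame is unreliable
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
covs = np.stack([np.eye(4) + 0.5, np.eye(4) + 0.2])
x_hat = gmm_conditional_mean(window, mask, weights, means, covs)
```

With full covariances the parameter count grows quadratically in the window dimension, which is the cost the abstract says the MFA model is used to limit.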
Similar resources
Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors
We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from e...
Sparse Reconstruction of Multi-Window Time-Frequency Representation Based on Hermite functions
Multi-window spectrograms offer higher energy concentration in contrast to the traditional single-window spectrograms. However, these quadratic time-frequency distributions were not introduced to deal with randomly undersampled signals. This paper applies sparse reconstruction techniques to provide time-frequency representations of nonstationary signals using the Hermite functions as multiple w...
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...
Scalable distributed speech recognition using Gaussian mixture model-based block quantisation
In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen–Loève transform of the block quantiser. Comp...
Time-dependent cross-probability model for multi-environment model based LInear normalization
In previous work, Multi-Environment Model based LInear Normalization (MEMLIN) was presented and shown to be effective in compensating for environment mismatch. MEMLIN is an empirical feature vector normalization which models clean and noisy spaces by Gaussian Mixture Models (GMMs). In this algorithm, the probability of the clean model Gaussian, given the noisy model one and the noisy featur...